Skip to content

ct: add topic manifest uploader#29470

Merged
andrwng merged 6 commits intoredpanda-data:devfrom
andrwng:ct-topic-manifest-uploader
Feb 5, 2026
Merged

ct: add topic manifest uploader#29470
andrwng merged 6 commits intoredpanda-data:devfrom
andrwng:ct-topic-manifest-uploader

Conversation

@andrwng
Copy link
Contributor

@andrwng andrwng commented Jan 30, 2026

This adds a topic manifest uploader for topic manifests. The topic manifest uploads follow cloud topic partition 0 leadership, and happen when there is a topic properties change. This follows the behavior of tiered storage topic manifest uploads, which happen in the NTP archiver of partition 0.

While they won't be used for WCR, topic manifests will be used for read replicas and topic mounting.

I wanted to reuse a good amount of code in tiered storage for topic manifests uploads and download. To that end, a good chunk of this PR is just pulling cloud_storage things into their own bazel libraries.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • None

Copilot AI review requested due to automatic review settings January 30, 2026 08:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a topic manifest uploader for cloud topics, enabling topic manifest uploads when there are topic property changes or leadership changes on partition 0. The implementation follows the existing tiered storage pattern and reuses code by extracting cloud_storage components into separate Bazel libraries.

Changes:

  • Adds topic_manifest_upload_manager to handle topic manifest uploads following partition 0 leadership
  • Refactors cloud_storage components into separate Bazel libraries for better reusability
  • Extracts topic_path_provider from remote_path_provider to share path computation logic

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/v/cloud_topics/topic_manifest_upload_manager.h Defines the manager interface for handling topic manifest uploads
src/v/cloud_topics/topic_manifest_upload_manager.cc Implements upload loop with retry logic and leadership tracking
src/v/cloud_topics/tests/topic_manifest_uploader_test.cc Adds integration tests for manifest upload scenarios
src/v/cloud_storage/topic_manifest_uploader.h Defines the uploader component with error handling
src/v/cloud_storage/topic_manifest_uploader.cc Implements manifest upload with proper exception handling
src/v/cloud_storage/topic_path_provider.h Extracts topic-level path computation into reusable component
src/v/cloud_storage/topic_path_provider.cc Implements topic path generation logic
src/v/cloud_storage/remote_path_provider.h Refactors to inherit from topic_path_provider
src/v/cloud_storage/remote_path_provider.cc Updates to use inherited topic path methods
src/v/cloud_topics/manager/manager.h Adds callback registration for property changes
src/v/cloud_topics/manager/manager.cc Implements property change notification handling
src/v/cloud_topics/app.h Adds topic_manifest_upload_mgr service
src/v/cloud_topics/app.cc Wires up the upload manager with notifications
src/v/cluster/archival/ntp_archiver_service.cc Updates to use new topic_manifest_uploader
src/v/cloud_storage/BUILD Splits monolithic library into granular components
src/v/cloud_topics/BUILD Adds dependencies for new upload manager

@andrwng andrwng force-pushed the ct-topic-manifest-uploader branch 8 times, most recently from 6bdb3de to 696bcbe Compare February 3, 2026 00:31
@vbotbuildovich
Copy link
Collaborator

CI test results

test results on build#79988
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
TestReadReplicaService test_identical_lwms_after_delete_records {"cloud_storage_type": 1, "partition_count": 5} integration https://buildkite.com/redpanda/redpanda/builds/79988#019c2111-107d-4818-b610-05dba4ffa478 FLAKY 18/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0541, p0=0.2954, reject_threshold=0.0100. adj_baseline=0.1538, p1=0.3869, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TestReadReplicaService&test_method=test_identical_lwms_after_delete_records

@andrwng andrwng requested review from Lazin, dotnwat and rockwotj February 3, 2026 03:14
Copy link
Contributor

@Lazin Lazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't introduce dependency on cloud_storage in cloud_topics. I think that the code reuse should be done by moving more general things like the remote_path_provider to cloud_io.

":cluster_services_interface",
":data_plane_impl",
":topic_manifest_upload_manager",
"//src/v/cloud_storage:remote",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not make cloud-topics depend on tiered-storage.

Copy link
Contributor Author

@andrwng andrwng Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's an unfortunate dependency, but it felt necessary if we want to tick the manifest uploads probe. Those feel like they don't belong in cloud_io, given cloud_io is meant to be RP-agnostic

Copy link
Contributor Author

@andrwng andrwng Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to have the uploader live in cloud_storage or somesuch, given there isn't really anything specific to cloud_topics here either, just the fact that we're only uploading for cloud_topics partitions.

EDIT: i take it back, we are relying on the cloud_topics manager for the leadership/properties notifications...

@Lazin
Copy link
Contributor

Lazin commented Feb 3, 2026

Maybe it makes sense to add a service that uploads topic manifests to the cloud/cloud_metadata? Run on every shard, subscribe to partition leadership changes, upload topic manifests in a completely isolated way.

// Manages topic manifest uploads for cloud topics partitions.
// Starts an upload loop when a partition becomes leader, stops when leadership
// is lost, and re-uploads when properties change.
class topic_manifest_upload_manager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be generalized to both TS and CT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having two places that upload topic manifests adds complexity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this generalizes to TS as well. I think longer term we can probably use this code for tiered storage too, but I would prefer to not block CT changes on changes to TS

@andrwng
Copy link
Contributor Author

andrwng commented Feb 3, 2026

Chatted a bit with Evgeny about this and decided that to avoid the dependency on cloud_storage, i'll add a new topic_manifest_io that doesn't depend on cloud_storage::remote

@andrwng andrwng force-pushed the ct-topic-manifest-uploader branch 3 times, most recently from ecee06f to cddae3d Compare February 4, 2026 20:36
@andrwng
Copy link
Contributor Author

andrwng commented Feb 4, 2026

Force pushes (compare):

  • the big thing is I removed the changes geared towards reusing the cloud_storage::remote in favor of making the topic manifest uploads using cloud_io::remote instead. This drastically cut down the changes to cloud_storage, though we still need the topic path utils and path provider, and a bit of state from the topic manifest.
  • reworked the manifest upload manager loop to use ssx::actor instead of a raw condition variable

@andrwng andrwng force-pushed the ct-topic-manifest-uploader branch from cddae3d to 638f079 Compare February 4, 2026 21:44
@andrwng
Copy link
Contributor Author

andrwng commented Feb 4, 2026

Ah i need to rebase to get some changes to ssx::actor

@andrwng andrwng force-pushed the ct-topic-manifest-uploader branch 2 times, most recently from da2e579 to 8034b6e Compare February 4, 2026 22:28
I'll have an upcoming change that will upload topic manifests outside of
cloud_storage. To that end, this pulls out the path provider utilities
into their own bazel library.
The underlying serializable state for topic manifests (for topics from
versions of Redpanda that serialize topic manifests with serde) can be
pulled out and reused by others who want to upload topic manifests (e.g.
cloud topics), without having to depend on all the base_manifest and
legacy parsing code in topic_manifest.cc. This commit pulls this out
into its own tiny header.
I intend on adding a loop for uploading topic manifests for cloud
topics. For tiered storage, this currently happens in the NTP archiver.

This commit duplicates some logic that is in the archiver for uploading
into its own little library. I'm opting to not touch the NTP archiver,
just since I don't want changes to cloud_topics to depend on changes to
cloud_storage and vice versa.
This adds an upload loop for topic manifests that runs only on cloud
topics' partitions 0. This functionality matches what is in the tiered
storage NTP archiver, which does the same thing in its main upload loop
on partition 0.

Note that there isn't really anything cloud topics specific in this
loop -- we could arguably use this same loop for tiered storage too.
That is left as future work.
Test for the recently added topic manifest uploader, that uploads a
topic manifest from partition 0 when there are leadership transfers or
topic property changes.
Review feedback. Reduces complexity a bit.
@andrwng andrwng force-pushed the ct-topic-manifest-uploader branch from 8034b6e to 0bbcc9a Compare February 4, 2026 22:43
@andrwng andrwng requested a review from Lazin February 5, 2026 01:40
Copy link
Contributor

@Lazin Lazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


ss::future<> topic_manifest_upload_manager::reset_or_signal_loop(
model::topic_id_partition tidp,
ss::optimized_optional<ss::lw_shared_ptr<cluster::partition>> partition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this method hold a gate? it can potentially run concurrently with 'stop'

OK, so the queue guarantees that this is not running concurrently with stop

@andrwng andrwng merged commit 70d49ac into redpanda-data:dev Feb 5, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants